Overview

Dataset statistics

Number of variables: 9
Number of observations: 1030
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 25
Duplicate rows (%): 2.4%
Total size in memory: 72.5 KiB
Average record size in memory: 72.1 B

Variable types

NUM: 9

Reproduction

Analysis started: 2020-08-29 03:16:07.054176
Analysis finished: 2020-08-29 03:16:27.205297
Duration: 20.15 seconds
Software version: pandas-profiling v2.9.0rc1
Download configuration: config.yaml

Warnings

Duplicates: Dataset has 25 (2.4%) duplicate rows
Zeros: slag has 471 (45.7%) zeros
Zeros: ash has 566 (55.0%) zeros
Zeros: superplastic has 379 (36.8%) zeros
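
The percentages in these warnings follow from the counts and the 1030 observations listed in the overview. A minimal sketch of that arithmetic; the helper name `share_percent` is ours and is not part of pandas-profiling:

```python
# Counts copied from the warnings above; 1030 is the number of observations.
N_OBSERVATIONS = 1030
WARNING_COUNTS = {
    "duplicate rows": 25,
    "slag zeros": 471,
    "ash zeros": 566,
    "superplastic zeros": 379,
}


def share_percent(count: int, total: int) -> float:
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100 * count / total, 1)


shares = {name: share_percent(c, N_OBSERVATIONS) for name, c in WARNING_COUNTS.items()}
for name, pct in shares.items():
    print(f"{name}: {pct}%")
```

Each computed share matches the percentage printed next to the corresponding count.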

Variables

cement
Real number (ℝ≥0)

Distinct count: 278
Unique (%): 27.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 281.168
Minimum: 102
Maximum: 540
Zeros: 0
Zeros (%): 0.0%
Memory size: 8.0 KiB

Quantile statistics

Minimum: 102
5-th percentile: 143.745
Q1: 192.375
Median: 272.9
Q3: 350
95-th percentile: 480
Maximum: 540
Range: 438
Interquartile range (IQR): 157.625

Descriptive statistics

Standard deviation: 104.506
Coefficient of variation (CV): 0.371687
Kurtosis: -0.520652
Mean: 281.168
Median Absolute Deviation (MAD): 79.4
Skewness: 0.509481
Sum: 289603
Variance: 10921.6
Monotonicity: Not monotonic
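
Several of the cement statistics are derived from others: the range is maximum minus minimum, the IQR is Q3 minus Q1, the CV is the standard deviation divided by the mean, and the variance is the squared standard deviation. A small sketch that recomputes them from the rounded figures in the report (variable names are ours); because the inputs are rounded, the last digits can differ slightly from the reported values:

```python
# Values copied from the cement statistics in this report.
minimum, maximum = 102.0, 540.0
q1, q3 = 192.375, 350.0
mean, std = 281.168, 104.506

value_range = maximum - minimum  # reported Range: 438
iqr = q3 - q1                    # reported IQR: 157.625
cv = std / mean                  # reported CV: 0.371687
variance = std ** 2              # reported Variance: 10921.6

print(f"range={value_range}, IQR={iqr}, CV={cv:.6f}, variance={variance:.1f}")
```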
Histogram with fixed size bins (bins=50)
Value               Count  Frequency (%)
425                    20  1.9%
362.6                  20  1.9%
251.4                  15  1.5%
446                    14  1.4%
310                    14  1.4%
475                    13  1.3%
331                    13  1.3%
250                    13  1.3%
387                    12  1.2%
349                    12  1.2%
Other values (268)    884  85.8%

Value               Count  Frequency (%)
102                     4  0.4%
108.3                   4  0.4%
116                     4  0.4%
122.6                   4  0.4%
132                     2  0.2%

Value               Count  Frequency (%)
540                     9  0.9%
531.3                   5  0.5%
528                     1  0.1%
525                     7  0.7%
522                     2  0.2%
 

slag
Real number (ℝ≥0)

ZEROS

Distinct count: 185
Unique (%): 18.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 73.8958
Minimum: 0
Maximum: 359.4
Zeros: 471
Zeros (%): 45.7%
Memory size: 8.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 22
Q3: 142.95
95-th percentile: 236
Maximum: 359.4
Range: 359.4
Interquartile range (IQR): 142.95

Descriptive statistics

Standard deviation: 86.2793
Coefficient of variation (CV): 1.16758
Kurtosis: -0.508175
Mean: 73.8958
Median Absolute Deviation (MAD): 22
Skewness: 0.800717
Sum: 76112.7
Variance: 7444.12
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value               Count  Frequency (%)
0                     471  45.7%
189                    30  2.9%
106.3                  20  1.9%
24                     14  1.4%
20                     12  1.2%
145                    11  1.1%
19                     10  1.0%
98.1                   10  1.0%
22                      8  0.8%
26                      8  0.8%
Other values (175)    436  42.3%

Value               Count  Frequency (%)
0                     471  45.7%
11                      4  0.4%
13.6                    5  0.5%
15                      5  0.5%
17.2                    1  0.1%

Value               Count  Frequency (%)
359.4                   2  0.2%
342.1                   2  0.2%
316.1                   2  0.2%
305.3                   4  0.4%
290.2                   2  0.2%
 

ash
Real number (ℝ≥0)

ZEROS

Distinct count: 156
Unique (%): 15.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 54.1883
Minimum: 0
Maximum: 200.1
Zeros: 566
Zeros (%): 55.0%
Memory size: 8.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 118.3
95-th percentile: 167
Maximum: 200.1
Range: 200.1
Interquartile range (IQR): 118.3

Descriptive statistics

Standard deviation: 63.997
Coefficient of variation (CV): 1.18101
Kurtosis: -1.32875
Mean: 54.1883
Median Absolute Deviation (MAD): 0
Skewness: 0.537354
Sum: 55814
Variance: 4095.62
Monotonicity: Not monotonic
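
The median and MAD of 0 for ash are a direct consequence of more than half of the observations being zero: the middle values are zero, so the median is zero, and more than half of the absolute deviations from that median are zero as well. A minimal sketch of this effect; the nonzero ash values are replaced by a hypothetical placeholder, since only their count matters here:

```python
from statistics import median

N_OBSERVATIONS = 1030
N_ZEROS = 566          # zeros reported for ash
PLACEHOLDER = 100.0    # hypothetical stand-in for every nonzero ash value

values = [0.0] * N_ZEROS + [PLACEHOLDER] * (N_OBSERVATIONS - N_ZEROS)


def median_absolute_deviation(xs):
    """Median of the absolute deviations from the median."""
    center = median(xs)
    return median([abs(x - center) for x in xs])


ash_median = median(values)
ash_mad = median_absolute_deviation(values)
print(ash_median, ash_mad)
```

The result is the same for any positive placeholder, which is why a MAD of 0 says little about the spread of the nonzero part of such a column.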
Histogram with fixed size bins (bins=50)
Value               Count  Frequency (%)
0                     566  55.0%
118.3                  20  1.9%
141                    16  1.6%
24.5                   15  1.5%
79                     14  1.4%
94                     13  1.3%
100.4                  11  1.1%
100.5                  10  1.0%
98.8                   10  1.0%
174.2                  10  1.0%
Other values (146)    345  33.5%

Value               Count  Frequency (%)
0                     566  55.0%
24.5                   15  1.5%
59                      1  0.1%
60                      1  0.1%
71                      1  0.1%

Value               Count  Frequency (%)
200.1                   1  0.1%
200                     1  0.1%
195                     3  0.3%
194.9                   1  0.1%
194                     1  0.1%
 

water
Real number (ℝ≥0)

Distinct count: 195
Unique (%): 18.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 181.567
Minimum: 121.8
Maximum: 247
Zeros: 0
Zeros (%): 0.0%
Memory size: 8.0 KiB

Quantile statistics

Minimum: 121.8
5-th percentile: 146.1
Q1: 164.9
Median: 185
Q3: 192
95-th percentile: 228
Maximum: 247
Range: 125.2
Interquartile range (IQR): 27.1

Descriptive statistics

Standard deviation: 21.3542
Coefficient of variation (CV): 0.11761
Kurtosis: 0.122082
Mean: 181.567
Median Absolute Deviation (MAD): 13
Skewness: 0.0746284
Sum: 187014
Variance: 456.003
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value               Count  Frequency (%)
192                   118  11.5%
228                    54  5.2%
185.7                  46  4.5%
203.5                  36  3.5%
186                    28  2.7%
162                    20  1.9%
164.9                  20  1.9%
153.5                  15  1.5%
185                    15  1.5%
178                    14  1.4%
Other values (185)    664  64.5%

Value               Count  Frequency (%)
121.8                   5  0.5%
126.6                   5  0.5%
127                     1  0.1%
127.3                   1  0.1%
137.8                   5  0.5%

Value               Count  Frequency (%)
247                     1  0.1%
246.9                   1  0.1%
237                     1  0.1%
236.7                   1  0.1%
228                    54  5.2%
 

superplastic
Real number (ℝ≥0)

ZEROS

Distinct count: 111
Unique (%): 10.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 6.20466
Minimum: 0
Maximum: 32.2
Zeros: 379
Zeros (%): 36.8%
Memory size: 8.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 6.4
Q3: 10.2
95-th percentile: 16.055
Maximum: 32.2
Range: 32.2
Interquartile range (IQR): 10.2

Descriptive statistics

Standard deviation: 5.97384
Coefficient of variation (CV): 0.962799
Kurtosis: 1.41127
Mean: 6.20466
Median Absolute Deviation (MAD): 5.3
Skewness: 0.907203
Sum: 6390.8
Variance: 35.6868
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value               Count  Frequency (%)
0                     379  36.8%
11.6                   37  3.6%
8                      27  2.6%
7                      19  1.8%
6                      17  1.7%
9                      16  1.6%
8.9                    16  1.6%
7.8                    16  1.6%
9.9                    16  1.6%
10                     15  1.5%
Other values (101)    472  45.8%

Value               Count  Frequency (%)
0                     379  36.8%
1.7                     4  0.4%
1.9                     1  0.1%
2                       1  0.1%
2.2                     1  0.1%

Value               Count  Frequency (%)
32.2                    5  0.5%
28.2                    5  0.5%
23.4                    5  0.5%
22.1                    1  0.1%
22                      6  0.6%
 

coarseagg
Real number (ℝ≥0)

Distinct count: 284
Unique (%): 27.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 972.919
Minimum: 801
Maximum: 1145
Zeros: 0
Zeros (%): 0.0%
Memory size: 8.0 KiB

Quantile statistics

Minimum: 801
5-th percentile: 842
Q1: 932
Median: 968
Q3: 1029.4
95-th percentile: 1104
Maximum: 1145
Range: 344
Interquartile range (IQR): 97.4

Descriptive statistics

Standard deviation: 77.754
Coefficient of variation (CV): 0.0799182
Kurtosis: -0.599016
Mean: 972.919
Median Absolute Deviation (MAD): 46.3
Skewness: -0.0402197
Sum: 1.00211e+06
Variance: 6045.68
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value               Count  Frequency (%)
932                    57  5.5%
852.1                  45  4.4%
944.7                  30  2.9%
968                    29  2.8%
1125                   24  2.3%
967                    19  1.8%
1047                   19  1.8%
974                    12  1.2%
942                    12  1.2%
822                    12  1.2%
Other values (274)    771  74.9%

Value               Count  Frequency (%)
801                     4  0.4%
801.1                   1  0.1%
801.4                   1  0.1%
811                     2  0.2%
814                     1  0.1%

Value               Count  Frequency (%)
1145                    1  0.1%
1134.3                  5  0.5%
1130                    1  0.1%
1125                   24  2.3%
1124.4                  2  0.2%
 

fineagg
Real number (ℝ≥0)

Distinct count: 302
Unique (%): 29.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 773.58
Minimum: 594
Maximum: 992.6
Zeros: 0
Zeros (%): 0.0%
Memory size: 8.0 KiB

Quantile statistics

Minimum: 594
5-th percentile: 613
Q1: 730.95
Median: 779.5
Q3: 824
95-th percentile: 898.09
Maximum: 992.6
Range: 398.6
Interquartile range (IQR): 93.05

Descriptive statistics

Standard deviation: 80.176
Coefficient of variation (CV): 0.103643
Kurtosis: -0.102177
Mean: 773.58
Median Absolute Deviation (MAD): 45.5
Skewness: -0.25301
Sum: 796788
Variance: 6428.19
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value               Count  Frequency (%)
594                    30  2.9%
755.8                  30  2.9%
670                    23  2.2%
613                    22  2.1%
801                    16  1.6%
887.1                  15  1.5%
746.6                  15  1.5%
845                    14  1.4%
712                    14  1.4%
750                    12  1.2%
Other values (292)    839  81.5%

Value               Count  Frequency (%)
594                    30  2.9%
605                     5  0.5%
611.8                   5  0.5%
612                     1  0.1%
613                    22  2.1%

Value               Count  Frequency (%)
992.6                   5  0.5%
945                     4  0.4%
943.1                   4  0.4%
942                     4  0.4%
925.7                   5  0.5%
 

age
Real number (ℝ≥0)

Distinct count: 14
Unique (%): 1.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 45.6621
Minimum: 1
Maximum: 365
Zeros: 0
Zeros (%): 0.0%
Memory size: 8.0 KiB

Quantile statistics

Minimum: 1
5-th percentile: 3
Q1: 7
Median: 28
Q3: 56
95-th percentile: 180
Maximum: 365
Range: 364
Interquartile range (IQR): 49

Descriptive statistics

Standard deviation: 63.1699
Coefficient of variation (CV): 1.38342
Kurtosis: 12.169
Mean: 45.6621
Median Absolute Deviation (MAD): 21
Skewness: 3.26918
Sum: 47032
Variance: 3990.44
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=14)
Value               Count  Frequency (%)
28                    425  41.3%
3                     134  13.0%
7                     126  12.2%
56                     91  8.8%
14                     62  6.0%
90                     54  5.2%
100                    52  5.0%
180                    26  2.5%
91                     22  2.1%
365                    14  1.4%
Other values (4)       24  2.3%

Value               Count  Frequency (%)
1                       2  0.2%
3                     134  13.0%
7                     126  12.2%
14                     62  6.0%
28                    425  41.3%

Value               Count  Frequency (%)
365                    14  1.4%
360                     6  0.6%
270                    13  1.3%
180                    26  2.5%
120                     3  0.3%
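
Because age has only 14 distinct values, its full distribution can be rebuilt from the frequency tables: the ten most common values plus the four remaining ones (1, 120, 270 and 360, whose counts add up to the 24 "Other values"). The sketch below does that and recomputes a few of the reported statistics. It assumes the reported variance is the sample variance (divisor n - 1), which is what `statistics.variance` computes:

```python
from statistics import mean, median, variance

# value -> count, copied from the age frequency tables in this report
AGE_COUNTS = {
    28: 425, 3: 134, 7: 126, 56: 91, 14: 62,
    90: 54, 100: 52, 180: 26, 91: 22, 365: 14,
    1: 2, 360: 6, 270: 13, 120: 3,
}

# Expand the counts back into one entry per observation.
ages = [age for age, count in AGE_COUNTS.items() for _ in range(count)]

print("n        =", len(ages))             # reported observations: 1030
print("sum      =", sum(ages))             # reported Sum: 47032
print("mean     =", round(mean(ages), 4))  # reported Mean: 45.6621
print("median   =", median(ages))          # reported median: 28
print("variance =", round(variance(ages), 2))  # reported Variance: 3990.44
```

The agreement with the reported variance supports the n - 1 assumption for this column.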
 

strength
Real number (ℝ≥0)

Distinct count: 845
Unique (%): 82.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 35.818
Minimum: 2.33
Maximum: 82.6
Zeros: 0
Zeros (%): 0.0%
Memory size: 8.0 KiB

Quantile statistics

Minimum: 2.33
5-th percentile: 10.961
Q1: 23.71
Median: 34.445
Q3: 46.135
95-th percentile: 66.802
Maximum: 82.6
Range: 80.27
Interquartile range (IQR): 22.425

Descriptive statistics

Standard deviation: 16.7057
Coefficient of variation (CV): 0.466407
Kurtosis: -0.313725
Mean: 35.818
Median Absolute Deviation (MAD): 10.93
Skewness: 0.416977
Sum: 36892.5
Variance: 279.082
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value               Count  Frequency (%)
33.4                    6  0.6%
79.3                    4  0.4%
41.05                   4  0.4%
71.3                    4  0.4%
35.3                    4  0.4%
23.52                   4  0.4%
31.35                   4  0.4%
77.3                    4  0.4%
37.27                   3  0.3%
55.9                    3  0.3%
Other values (835)    990  96.1%

Value               Count  Frequency (%)
2.33                    1  0.1%
3.32                    1  0.1%
4.57                    1  0.1%
4.78                    1  0.1%
4.83                    1  0.1%

Value               Count  Frequency (%)
82.6                    1  0.1%
81.75                   1  0.1%
80.2                    1  0.1%
79.99                   1  0.1%
79.4                    1  0.1%
 

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, which implies that for a linear relationship the steepness of the line (its angle to the x-axis) does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
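
The calculation described above can be sketched in a few lines of plain Python. The function name `pearson_r` is ours, and the data are the cement and strength columns of the ten "First rows" shown in the Sample section, so the result is not the coefficient for the full dataset:

```python
import math


def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    return cov / (std_x * std_y)


# cement and strength from the ten "First rows" of the sample
cement = [141.3, 168.9, 250.0, 266.0, 154.8, 255.0, 166.8, 251.4, 296.0, 155.0]
strength = [29.89, 23.51, 29.22, 45.85, 18.29, 21.86, 15.75, 36.64, 21.65, 28.99]

r_sample = pearson_r(cement, strength)
# Shifting and rescaling either variable by a positive factor leaves r unchanged.
r_rescaled = pearson_r([c / 1000 + 5 for c in cement], [2 * s - 1 for s in strength])
# A perfectly linear relationship gives r = 1 regardless of its slope.
r_perfect = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])

print(r_sample, r_rescaled, r_perfect)
```

The same divisor (n here) is used for the covariance and both standard deviations, so it cancels; using n - 1 throughout gives the same r.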

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
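
As a sketch of this definition, the code below replaces each value by its rank and then applies the covariance-over-standard-deviations formula to the ranks. Tied values receive the average of their ranks, which is a common convention and an assumption here; all function names are ours:

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson_r(xs, ys):
    """Covariance divided by the product of the standard deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def spearman_rho(xs, ys):
    """Pearson's r computed on the rank variables of xs and ys."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


xs = [1, 2, 3, 4, 5]
cubes = [x ** 3 for x in xs]  # monotonic but not linear
print(spearman_rho(xs, cubes), pearson_r(xs, cubes))
```

For the cubic example the ranks of both variables coincide, so ρ reaches its maximum while Pearson's r stays below it.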

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
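
The pair-counting definition above corresponds to the variant usually called tau-a. A minimal sketch follows; the function name is ours, and tied pairs are counted as neither concordant nor discordant. Many library implementations apply an additional tie correction (tau-b), so their results can differ when the data contain ties:

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """(concordant - discordant) / total number of pairs."""
    concordant = discordant = 0
    pairs = list(combinations(range(len(xs)), 2))
    for i, j in pairs:
        direction = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if direction > 0:
            concordant += 1   # both variables move in the same direction
        elif direction < 0:
            discordant += 1   # the variables move in opposite directions
    return (concordant - discordant) / len(pairs)


# Three observations: two concordant pairs and one discordant pair.
tau = kendall_tau_a([1, 2, 3], [1, 3, 2])
print(tau)
```

The loop visits every pair, so the cost grows quadratically with the number of observations.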

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient for a bivariate normal input distribution. Extensive documentation is available in the phik project documentation.

Missing values

Sample

First rows

      cement   slag    ash  water  superplastic  coarseagg  fineagg  age  strength
0      141.3  212.0    0.0  203.5           0.0      971.8    748.5   28     29.89
1      168.9   42.2  124.3  158.3          10.8     1080.8    796.2   14     23.51
2      250.0    0.0   95.7  187.4           5.5      956.9    861.2   28     29.22
3      266.0  114.0    0.0  228.0           0.0      932.0    670.0   28     45.85
4      154.8  183.4    0.0  193.3           9.1     1047.4    696.7   28     18.29
5      255.0    0.0    0.0  192.0           0.0      889.8    945.0   90     21.86
6      166.8  250.2    0.0  203.5           0.0      975.6    692.6    7     15.75
7      251.4    0.0  118.3  188.5           6.4     1028.4    757.7   56     36.64
8      296.0    0.0    0.0  192.0           0.0     1085.0    765.0   28     21.65
9      155.0  184.0  143.0  194.0           9.0      880.0    699.0   28     28.99

Last rows

      cement   slag    ash  water  superplastic  coarseagg  fineagg  age  strength
1020   183.9  122.6    0.0  203.5           0.0      959.2    800.0    7     10.79
1021   203.5  305.3    0.0  203.5           0.0      963.4    630.0    3      9.56
1022   144.8    0.0  133.6  180.8          11.1      979.5    811.5   28     13.20
1023   141.3  212.0    0.0  203.5           0.0      971.8    748.5    7     10.39
1024   297.2    0.0  117.5  174.8           9.5     1022.8    753.5    3     21.91
1025   135.0    0.0  166.0  180.0          10.0      961.0    805.0   28     13.29
1026   531.3    0.0    0.0  141.8          28.2      852.1    893.7    3     41.30
1027   276.4  116.0   90.3  179.6           8.9      870.1    768.3   28     44.28
1028   342.0   38.0    0.0  228.0           0.0      932.0    670.0  270     55.06
1029   540.0    0.0    0.0  173.0           0.0     1125.0    613.0    7     52.61

Duplicate rows

Most frequent

      cement   slag    ash  water  superplastic  coarseagg  fineagg  age  strength  count
1      362.6  189.0    0.0  164.9          11.6      944.7    755.8    3     35.30      4
3      362.6  189.0    0.0  164.9          11.6      944.7    755.8   28     71.30      4
4      362.6  189.0    0.0  164.9          11.6      944.7    755.8   56     77.30      4
5      362.6  189.0    0.0  164.9          11.6      944.7    755.8   91     79.30      4
2      362.6  189.0    0.0  164.9          11.6      944.7    755.8    7     55.90      3
6      425.0  106.3    0.0  153.5          16.5      852.1    887.1    3     33.40      3
7      425.0  106.3    0.0  153.5          16.5      852.1    887.1    7     49.20      3
8      425.0  106.3    0.0  153.5          16.5      852.1    887.1   28     60.29      3
9      425.0  106.3    0.0  153.5          16.5      852.1    887.1   56     64.30      3
10     425.0  106.3    0.0  153.5          16.5      852.1    887.1   91     65.20      3
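
The table lists only the most frequent duplicate groups, so its counts need not add up to the 25 duplicate rows reported in the overview. The sketch below counts copies beyond the first occurrence in each group. That counting rule is an assumption about how the report arrives at 25, and the rows are abbreviated to (cement, age, strength), which is enough to tell the ten listed groups apart:

```python
from collections import Counter


def extra_copies(rows):
    """Number of rows that repeat an earlier identical row."""
    counts = Counter(rows)
    return sum(count - 1 for count in counts.values())


# ((cement, age, strength), count) for the ten groups listed in the table
GROUPS = [
    ((362.6, 3, 35.30), 4), ((362.6, 28, 71.30), 4),
    ((362.6, 56, 77.30), 4), ((362.6, 91, 79.30), 4),
    ((362.6, 7, 55.90), 3), ((425.0, 3, 33.40), 3),
    ((425.0, 7, 49.20), 3), ((425.0, 28, 60.29), 3),
    ((425.0, 56, 64.30), 3), ((425.0, 91, 65.20), 3),
]

# Expand each group into its individual rows, then count the repeats.
rows = [key for key, count in GROUPS for _ in range(count)]
listed_extra = extra_copies(rows)
print(len(rows), listed_extra)
```

Under that assumption the ten listed groups account for 24 of the 25 duplicate rows, which would leave one further duplicate outside the table.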